{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "3o8Qof7Cy165"
   },
   "source": [
    "# LAB 3c:  BigQuery ML Model Deep Neural Network.\n",
    "\n",
    "## Learning Objectives\n",
    "\n",
    "1. Create and evaluate DNN model with BigQuery ML.\n",
    "1. Create and evaluate DNN model with feature engineering with ML.TRANSFORM.\n",
    "1. Calculate predictions with BigQuery's ML.PREDICT.\n",
    "\n",
    "\n",
    "## Introduction \n",
    "In this notebook, we will create multiple deep neural network models to predict the weight of a baby before it is born, using first no feature engineering and then the feature engineering from the previous lab using BigQuery ML.\n",
    "\n",
    "We will create and evaluate a DNN model using BigQuery ML, with and without feature engineering using BigQuery's ML.TRANSFORM and calculate predictions with BigQuery's ML.PREDICT. If you need a refresher, you can go back and look how we made a baseline model in the notebook [BQML Baseline Model](../solutions/3a_bqml_baseline_babyweight.ipynb) or how we combined linear models with feature engineering in the notebook [BQML Linear Models with Feature Engineering](../solutions/3b_bqml_linear_transform_babyweight.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hJ7ByvoXzpVI"
   },
   "source": [
    "## Load necessary libraries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "mC9K9Dpx1ztf"
   },
   "source": [
    "Check that the Google BigQuery library is installed and if not, install it. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 609
    },
    "colab_type": "code",
    "id": "RZUQtASG10xO",
    "outputId": "5612d6b0-9730-476a-a28f-8fdc14f4ecde"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "google-cloud-bigquery==1.6.1\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "pip freeze | grep google-cloud-bigquery==1.6.1 || \\\n",
    "pip install google-cloud-bigquery==1.6.1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "clnaaqQsXkwC"
   },
   "source": [
    "## Verify tables exist\n",
    "\n",
    "Run the following cells to verify that we have previously created the dataset and data tables. If not, go back to lab [1b_prepare_data_babyweight](../solutions/1b_prepare_data_babyweight.ipynb) to create them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>weight_pounds</th>\n",
       "      <th>is_male</th>\n",
       "      <th>mother_age</th>\n",
       "      <th>plurality</th>\n",
       "      <th>gestation_weeks</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [weight_pounds, is_male, mother_age, plurality, gestation_weeks]\n",
       "Index: []"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "-- LIMIT 0 is a free query; this allows us to check that the table exists.\n",
    "SELECT * FROM babyweight.babyweight_data_train\n",
    "LIMIT 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>weight_pounds</th>\n",
       "      <th>is_male</th>\n",
       "      <th>mother_age</th>\n",
       "      <th>plurality</th>\n",
       "      <th>gestation_weeks</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [weight_pounds, is_male, mother_age, plurality, gestation_weeks]\n",
       "Index: []"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "-- LIMIT 0 is a free query; this allows us to check that the table exists.\n",
    "SELECT * FROM babyweight.babyweight_data_eval\n",
    "LIMIT 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model 4:  Increase complexity of model using DNN_REGRESSOR\n",
    "\n",
    "DNN_REGRESSOR is a new regression model_type vs. the LINEAR_REG that we have been using in previous labs.\n",
    "\n",
    "* MODEL_TYPE=\"DNN_REGRESSOR\"\n",
    "\n",
    "* hidden_units: List of hidden units per layer; all layers are fully connected. Number of elements in the array will be the number of hidden layers. The default value for hidden_units is [Min(128, N / (𝜶(Ni+No)))] (1 hidden layer), with N the training data size, Ni, No the input layer and output layer units, respectively, 𝜶 is constant with value 10. The upper bound of the rule will make sure the model won’t be over fitting. Note that, we currently have a model size limitation to 256MB.\n",
    "\n",
    "* dropout: Probability to drop a given coordinate during training; dropout is a very common technique to avoid overfitting in DNNs. The default value is zero, which means we will not drop out any coordinate during training.\n",
    "\n",
    "* batch_size: Number of samples that will be served to train the network for each sub iteration. The default value is Min(1024, num_examples) to balance the training speed and convergence. Serving all training data in each sub-iteration may lead to convergence issues, and is not advised."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create DNN_REGRESSOR model\n",
    "\n",
    "Let's train a DNN regressor model in BQ using `MODEL_TYPE=DNN_REGRESSOR` with 2 hidden layers with 64 and 32 neurons each (`HIDDEN_UNITS=[64, 32]`) and a batch size of 32 (`BATCH_SIZE=32`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: []\n",
       "Index: []"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "CREATE OR REPLACE MODEL\n",
    "    babyweight.model_4\n",
    "OPTIONS (\n",
    "    MODEL_TYPE=\"DNN_REGRESSOR\",\n",
    "    HIDDEN_UNITS=[64, 32],\n",
    "    BATCH_SIZE=32,\n",
    "    INPUT_LABEL_COLS=[\"weight_pounds\"],\n",
    "    DATA_SPLIT_METHOD=\"NO_SPLIT\") AS\n",
    "\n",
    "SELECT\n",
    "    weight_pounds,\n",
    "    is_male,\n",
    "    mother_age,\n",
    "    plurality,\n",
    "    gestation_weeks\n",
    "FROM\n",
    "    babyweight.babyweight_data_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "AVPXGKZ374v7"
   },
   "source": [
    "### Get training information and evaluate"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first look at our training statistics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>training_run</th>\n",
       "      <th>iteration</th>\n",
       "      <th>loss</th>\n",
       "      <th>eval_loss</th>\n",
       "      <th>learning_rate</th>\n",
       "      <th>duration_ms</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>1.738382</td>\n",
       "      <td>2.713294</td>\n",
       "      <td>0.1</td>\n",
       "      <td>644721</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.405464</td>\n",
       "      <td>2.604605</td>\n",
       "      <td>0.1</td>\n",
       "      <td>606809</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.732892</td>\n",
       "      <td>2.641865</td>\n",
       "      <td>0.1</td>\n",
       "      <td>599237</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1.014293</td>\n",
       "      <td>2.191636</td>\n",
       "      <td>0.1</td>\n",
       "      <td>618610</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.001003</td>\n",
       "      <td>2.943201</td>\n",
       "      <td>0.1</td>\n",
       "      <td>830424</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   training_run  iteration      loss  eval_loss  learning_rate  duration_ms\n",
       "0             0          4  1.738382   2.713294            0.1       644721\n",
       "1             0          3  2.405464   2.604605            0.1       606809\n",
       "2             0          2  0.732892   2.641865            0.1       599237\n",
       "3             0          1  1.014293   2.191636            0.1       618610\n",
       "4             0          0  1.001003   2.943201            0.1       830424"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.model_4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's evaluate our trained model on our eval dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean_absolute_error</th>\n",
       "      <th>mean_squared_error</th>\n",
       "      <th>mean_squared_log_error</th>\n",
       "      <th>median_absolute_error</th>\n",
       "      <th>r2_score</th>\n",
       "      <th>explained_variance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0.803307</td>\n",
       "      <td>1.065722</td>\n",
       "      <td>0.018036</td>\n",
       "      <td>0.658235</td>\n",
       "      <td>0.387528</td>\n",
       "      <td>0.390166</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
       "0             0.803307            1.065722                0.018036   \n",
       "\n",
       "   median_absolute_error  r2_score  explained_variance  \n",
       "0               0.658235  0.387528            0.390166  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    ML.EVALUATE(MODEL babyweight.model_4,\n",
    "    (\n",
    "    SELECT\n",
    "        weight_pounds,\n",
    "        is_male,\n",
    "        mother_age,\n",
    "        plurality,\n",
    "        gestation_weeks\n",
    "    FROM\n",
    "        babyweight.babyweight_data_eval\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use our evaluation's `mean_squared_error` to calculate our model's RMSE."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>rmse</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>1.032338</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       rmse\n",
       "0  1.032338"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    SQRT(mean_squared_error) AS rmse\n",
    "FROM\n",
    "    ML.EVALUATE(MODEL babyweight.model_4,\n",
    "    (\n",
    "    SELECT\n",
    "        weight_pounds,\n",
    "        is_male,\n",
    "        mother_age,\n",
    "        plurality,\n",
    "        gestation_weeks\n",
    "    FROM\n",
    "        babyweight.babyweight_data_eval\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final Model:  Apply the TRANSFORM clause\n",
    "\n",
    "Before we perform our prediction, we should encapsulate the entire feature set in a TRANSFORM clause as we did in the last notebook. This way we can have the same transformations applied for training and prediction without modifying the queries."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's apply the TRANSFORM clause to the final model and run the query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: []\n",
       "Index: []"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "CREATE OR REPLACE MODEL\n",
    "    babyweight.final_model\n",
    "\n",
    "TRANSFORM(\n",
    "    weight_pounds,\n",
    "    is_male,\n",
    "    mother_age,\n",
    "    plurality,\n",
    "    gestation_weeks,\n",
    "    ML.FEATURE_CROSS(\n",
    "        STRUCT(\n",
    "            is_male,\n",
    "            ML.BUCKETIZE(\n",
    "                mother_age,\n",
    "                GENERATE_ARRAY(15, 45, 1)\n",
    "            ) AS bucketed_mothers_age,\n",
    "            plurality,\n",
    "            ML.BUCKETIZE(\n",
    "                gestation_weeks,\n",
    "                GENERATE_ARRAY(17, 47, 1)\n",
    "            ) AS bucketed_gestation_weeks)\n",
    "    ) AS crossed)\n",
    "\n",
    "OPTIONS (\n",
    "    MODEL_TYPE=\"DNN_REGRESSOR\",\n",
    "    HIDDEN_UNITS=[64, 32],\n",
    "    BATCH_SIZE=32,\n",
    "    INPUT_LABEL_COLS=[\"weight_pounds\"],\n",
    "    DATA_SPLIT_METHOD=\"NO_SPLIT\") AS\n",
    "\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    babyweight.babyweight_data_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's first look at our training statistics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>training_run</th>\n",
       "      <th>iteration</th>\n",
       "      <th>loss</th>\n",
       "      <th>eval_loss</th>\n",
       "      <th>learning_rate</th>\n",
       "      <th>duration_ms</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0.681415</td>\n",
       "      <td>2.200278</td>\n",
       "      <td>0.1</td>\n",
       "      <td>147373</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.970840</td>\n",
       "      <td>1.803952</td>\n",
       "      <td>0.1</td>\n",
       "      <td>148089</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.704043</td>\n",
       "      <td>1.594089</td>\n",
       "      <td>0.1</td>\n",
       "      <td>146653</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1.400467</td>\n",
       "      <td>2.130462</td>\n",
       "      <td>0.1</td>\n",
       "      <td>216121</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   training_run  iteration      loss  eval_loss  learning_rate  duration_ms\n",
       "0             0          3  0.681415   2.200278            0.1       147373\n",
       "1             0          2  0.970840   1.803952            0.1       148089\n",
       "2             0          1  0.704043   1.594089            0.1       146653\n",
       "3             0          0  1.400467   2.130462            0.1       216121"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "SELECT * FROM ML.TRAINING_INFO(MODEL babyweight.final_model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's evaluate our trained model on our eval dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mean_absolute_error</th>\n",
       "      <th>mean_squared_error</th>\n",
       "      <th>mean_squared_log_error</th>\n",
       "      <th>median_absolute_error</th>\n",
       "      <th>r2_score</th>\n",
       "      <th>explained_variance</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>1.230722</td>\n",
       "      <td>2.346158</td>\n",
       "      <td>0.037347</td>\n",
       "      <td>1.055479</td>\n",
       "      <td>-0.34834</td>\n",
       "      <td>0.215212</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   mean_absolute_error  mean_squared_error  mean_squared_log_error  \\\n",
       "0             1.230722            2.346158                0.037347   \n",
       "\n",
       "   median_absolute_error  r2_score  explained_variance  \n",
       "0               1.055479  -0.34834            0.215212  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    ML.EVALUATE(MODEL babyweight.final_model,\n",
    "    (\n",
    "    SELECT\n",
    "        *\n",
    "    FROM\n",
    "        babyweight.babyweight_data_eval\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use our evaluation's `mean_squared_error` to calculate our model's RMSE."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>rmse</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>1.531717</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       rmse\n",
       "0  1.531717"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    SQRT(mean_squared_error) AS rmse\n",
    "FROM\n",
    "    ML.EVALUATE(MODEL babyweight.final_model,\n",
    "    (\n",
    "    SELECT\n",
    "        *\n",
    "    FROM\n",
    "        babyweight.babyweight_data_eval\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "DIuFS56o_F2a"
   },
   "source": [
    "## Predict with final model\n",
    "\n",
    "\n",
    "Now that you have evaluated your model, the next step is to use it to predict the weight of a baby before it is born, using BigQuery `ML.PREDICT` function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Predict from final model using an example from original dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "29cuS1CbADE3"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>predicted_weight_pounds</th>\n",
       "      <th>is_male</th>\n",
       "      <th>mother_age</th>\n",
       "      <th>plurality</th>\n",
       "      <th>gestation_weeks</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>3.675936</td>\n",
       "      <td>true</td>\n",
       "      <td>32</td>\n",
       "      <td>Twins(2)</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   predicted_weight_pounds is_male  mother_age plurality  gestation_weeks\n",
       "0                 3.675936    true          32  Twins(2)               30"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    ML.PREDICT(MODEL babyweight.final_model,\n",
    "    (\n",
    "    SELECT\n",
    "        \"true\" AS is_male,\n",
    "        32 AS mother_age,\n",
    "        \"Twins(2)\" AS plurality,\n",
    "        30 AS gestation_weeks\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modify above prediction query using example from simulated dataset\n",
    "\n",
    "Use the feature values you made up above, however set is_male to \"Unknown\" and plurality to \"Multiple(2+)\". This is simulating us not knowing the gender or the exact plurality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>predicted_weight_pounds</th>\n",
       "      <th>is_male</th>\n",
       "      <th>mother_age</th>\n",
       "      <th>plurality</th>\n",
       "      <th>gestation_weeks</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>3.477121</td>\n",
       "      <td>Unknown</td>\n",
       "      <td>32</td>\n",
       "      <td>Multiple(2+)</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   predicted_weight_pounds  is_male  mother_age     plurality  gestation_weeks\n",
       "0                 3.477121  Unknown          32  Multiple(2+)               30"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery\n",
    "SELECT\n",
    "    *\n",
    "FROM\n",
    "    ML.PREDICT(MODEL babyweight.final_model,\n",
    "    (\n",
    "    SELECT\n",
    "        \"Unknown\" AS is_male,\n",
    "        32 AS mother_age,\n",
    "        \"Multiple(2+)\" AS plurality,\n",
    "        30 AS gestation_weeks\n",
    "    ))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lab Summary: \n",
    "In this lab, we created and evaluated a DNN model using BigQuery ML, with and without feature engineering using BigQuery's ML.TRANSFORM and calculated predictions with BigQuery's ML.PREDICT."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2022 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "10.3.2019-DRAFT-_1 - FeatEnG - LABS.ipynb",
   "provenance": [],
   "toc_visible": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
